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One of the sources of gravitational waves for the proposed space-based gravitational wave de- 
tector, the Laser Interferometer Space Antenna (LISA), are the inspirals of compact objects into 
supermassive black holes in the centres of galaxies — extreme-mass-ratio inspirals (EMRIs). Using 
LISA observations, we will be able to measure the parameters of each EMRI system detected to very 
high precision. However, the statistics of the set of EMRI events observed by LISA will be more 
important in constraining astrophysical models than extremely precise measurements for individual 
systems. The black holes to which LISA is most sensitive are in a mass range that is difficult to probe 
using other techniques, so LISA provides an almost unique window onto these objects. In this paper 
we explore, using Bayesian techniques, the constraints that LISA EMRI observations can place on 
the mass function of black holes at low redshift. We describe a general framework for approaching 
inference of this type — using multiple observations in combination to constrain a parameterised 
source population. Assuming that the scaling of EMRI rate with black hole mass is known and 
taking a black hole distribution given by a simple power law, dn/dlnA/ = Ao(M/M*) a ° , we find 
that LISA could measure the parameters to a precision of A(hij4o) ~ 0.08, and A(ao) ~ 0.03 for a 
reference model that predicts ~ 1000 events. Even with as few as 10 events, LISA should constrain 
the slope to a precision ~ 0.3, which is the current level of observational uncertainty in the low-mass 
slope of the black hole mass function. We also consider a model in which Aq and ao evolve with 
redshift, but find that EMRI observations alone do not have much power to probe such an evolution. 

PACS numbers: 



I. INTRODUCTION 

The proposed Laser Interferometer Space Antenna (LISA) [T] will be sensitive to gravitational waves from systems 
containing supermassive black holes in the 1O 4 M0-1O 7 M0 range. This mass range is very hard to probe electromag- 
netically, and only five such systems have been robustly identified from dynamical measurements [2J, including the 
black hole in the centre of our own galaxy. Placing constraints on the mass function of low-mass black holes has, 
however, key astrophysical implications. 

It has been suggested [3] that the shape of the mass function of massive black holes at the low-mass end is a key 
diagnostic to derive constraints on the mechanism that formed black hole seeds. Seed black holes are predicted to 
form in the mass range ~ 100 — 1O 5 M , depending on the specific physical process involved [3j. As black holes grow 
from low-mass seeds, it is natural to expect that at least some black holes have not grown efficiently and still trace 
the initial conditions. Clearly, black holes at the high-mass end of the mass function have increased their mass by 
accretion, or they have experienced mergers and dynamical interactions. Any dependence of the black hole mass on 
the initial seed mass is erased. However, the distribution of low-mass black holes still retains some "memory" of the 
original seed mass distribution. The expectation is that ungrown seeds produce a peak in the mass function that 
corresponds to the peak of the seed mass function. The higher the efficiency of seed formation, the more pronounced 
is the peak. 

Additionally, the mass function of low-mass black holes can provide insights on the co-evolution of black holes and 
their hosts. Observations show that the masses of black holes correlate with the mass, luminosity and the stellar 
velocity dispersion of the host [3J and references therein] . These correlations imply that black holes evolve along with 
their hosts throughout cosmic time. Lauer et al. [4] suggest that at least some of these correlations break down at the 
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largest galaxy and black hole masses [but see [5] . One unanswered question is whether this symbiosis extends down 
to the lowest galaxy and black hole masses [6] , due to changes in the accretion properties [7] , dynamical effects [8] , or 
cosmic bias [5]. 

Since current measurements of black hole masses extend barely down to Mbh ~ 10 6 M Q , these features cannot 
be observationally tested with present data. LISA observations will significantly improve our understanding of the 
astrophysics of black holes in this mass range. LISA will detect mergers between supermassive black holes with these 
masses out to very high redshift, which will probe the early assembly of these systems and their host galaxies. LISA 
will also detect gravitational waves generated when compact objects (white dwarfs, neutron stars or black holes) 
are captured by and inspiral into supermassive black holes in the centres of galaxies [TU], which are referred to as 
extreme-mass-ratio inspirals (EMRIs). The EMRI events will mostly be at low redshift, z < 1, and can therefore be 
used to probe the quiescent population of ~ 1O 4 M -1O 7 Mq black holes that remain today. In this paper, we focus on 
this second type of source and explore what constraints LISA might be able to place on the low redshift population 
of black holes in this mass range. 

A typical EMRI event will have frequency ~ lmHz and will be observed by LISA for ~ 1 year, and so we will detect 
~ 10 5 cycles of the waveform. This allows LISA to make very precise measurements of the parameters of the host 
system QTJ [T2] , and it is hoped that EMRI observations can be used to carry out high precision tests of the spacetime 
geometry of the central object [TUJ and references therein]. For a typical EMRI event with signal-to-noise ratio (SNR) 
of ~ 30, we would expect to recover the redshifted mass of the central black hole to a precision A ln((l + z)M) ~ 10~ 4 
and the distance to the source to a precision of ~ 3%. The spin of the central black hole, the redshifted mass of 
the inspiralling object and the orbital parameters (initial radius and eccentricity) should also be measured to very 
high accuracy, A In A ~ 10 _4 -10 -3 . While such precise measurements for individual systems are interesting and very 
important if the data is to be used for high precision tests of relativity, astrophysically it is the statistics of the set of 
EMRI events that LISA observes that will be of greatest value in constraining models. It is this application of LISA 
EMRI observations that we focus on in the current paper. 

The distribution of events that LISA detects will depend on three factors — the number density of possible source 
systems; the rate at which EMRIs occur in systems with particular parameters; and the sensitivity, in terms of 
distance reach, of the LISA detector to particular types of EMRI event. The last effect can be estimated theoretically 
in advance of the LISA mission, so we concentrate on what LISA can tell us about the first two effects. A particular 
model for the black hole distribution and the rate of inspirals per black hole does not precisely predict the number 
of events that LISA will observe, since inspirals start stochastically in any given galaxy. However, a particular model 
does predict the rate at which observable EMRIs of particular type will occur, and hence the probability distribution 
of the observed events. The LISA observation is a sample from this distribution, and we wish to infer the parameters 
of the underlying model from this sample. This can be done with a simple application of Bayes Theorem. Bayesian 
techniques have been employed widely in the context of gravitational wave data analysis [see, for instance. 1131 IT3] . 
but this has been primarily to make statements about the parameters of individual sources. We can also use Bayesian 
methods to make statements about the underlying population from which the sources we detect are drawn and it is 
that which we will do here. Using LISA supermassive black hole merger events to constrain astrophysical models was 
considered in |15j . The focus of that work was to derive an "error kernel" which would map the intrinsic distribution 
of source parameters onto the observed distribution of source parameters, and for model selection they used a variant 
of the Kolmogorov-Smirnov test. Our work differs from that approach not only in the fact that we consider EMRIs, 
but in the use of a Bayesian framework for the analysis and a parameterised model for the underlying distribution 
that we wish to constrain. 

LISA will be able to measure the product of the EMRI rate per black hole with the number density of black holes, 
but it is not clear if it will be able to decouple the two effects. The scaling of the EMRI rate per black hole with the 
black hole parameters can, in principle at least, be constrained in advance through numerical simulations. One such 
analysis was carried out in [16 . Various assumptions were made in that analysis which may not be valid for black 
holes in the LISA mass range, and we will discuss these issues further in Section II B However, for the purpose of 
this work we will assume that the black hole rate scaling is known, and is given by the results in [16j . This allows 
us to interpret our results in terms of the underlying distribution of black hole masses. An alternative interpretation 
that does not rely on this assumption is that we are constraining the convolution of the black hole number density 
with the EMRI rate per black hole. 

The parameters of individual sources will not be measured perfectly by LISA, due to noise in the detector and 
confusion from other sources in the data stream. It is possible to include parameter errors consistently when making 
statements about the population, and we will describe how this is done in practice in Section |II C| However, this 
requires having properly sampled posterior distributions for all of the sources in the data set. An alternative approach 
is to bin the sources according to their maximum likelihood parameters. Although sources may end up in the wrong 
bins due to the parameter errors, the majority of sources will be correctly classified. We use the binning approach 
in this paper, since it allows us to assess LISA's ability to constrain the black hole population without requiring the 
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computationally expensive simulation of LISA noise and recovery of posteriors for hundreds of EMRI sources. We will 
demonstrate that our conclusions are not affected by the inclusion of reasonable parameter estimation errors, which 
indicates that our results are an accurate reflection of what will be achievable in practice. 

In this paper we take a single power law model for the black hole mass function and consider both a redshift- 
independent case of the form dn/dlnA/ = Ao(M/M*) a °, and a redshift-dcpcndcnt case of the form dro/d In M = 
Aq(1 + z) 1 {M / M*) aa ~ aiZ . In the former case, we find that LISA will be able to measure the parameters to precisions 
A(lnA ) ~ 0.08 and A(a ) ~ 0.03, and in the latter case to precisions A(lnA ) ~ 0.2, A(a ) ~ 0.06, A(A X ) ~ 0.7 

and A(ai) ~ 0.2. These precisions scale with the number of observed events like N oh J , and have been normalised 
to a reference case that predicts ~ 1000 events. We find that changing our assumptions about the performance of the 
LISA mission affects these conclusions only through the change in the number of events predicted. We also find that 
the precisions are somewhat improved if the black holes in the EMRI systems tend to have large spins. 

The paper is organised as follows. In Section [IT] we describe the theoretical framework that we employ in this 
analysis. This includes a discussion of Bayes Theorem, a description of the probability distribution for EMRI events 
that LISA will detect, the proper treatment of parameter measurement errors and a summary of existing constraints 
on the shape of the black hole mass function. In Section |III[ we present our results for both the redshift-independent 
and redshift-dependent models. This section includes a discussion of the effect of measurement errors and variation 
in the bin size used in the analysis. We also demonstrate how the results change as we vary the true black hole 
population in the Universe and as we change our assumptions about the performance of the LISA detector and the 
spin of the black holes. Finally, in Section [IV| we summarise our results and discuss directions for future research. 



II. THEORETICAL FRAMEWORK 



A. Bayesian inference 

Bayes' Theorem provides a way to infer the posterior probability distribution of the parameters, A, of a model based 
on a hypothesis H, given some observed data D 

? p{D\X,H)p{X\H) 
P{>\D,H) = , (1) 

where p(X\D,H) is the posterior probability distribution, p(D\X, H) is the likelihood, p(X\H) = n(X) is the prior 
probability on the model parameters and p(D\H) is the evidence for the hypothesis H given the data. The evidence 
plays a key role in model selection, but it enters Bayes' theorem as a normalisation constant that is independent of 
the model parameters, so we can ignore it if we are interested only in the posterior probability distribution. 

In traditional applications of Bayesian methods to problems in gravitational wave data analysis, we are trying to 
infer the parameters of the waveform based on the observed data from our detector. The uncertainty comes from the 
noise distribution in the detector, and the likelihood is based on the spectral density of instrumental noise. What 
we are interested in here is not the parameter estimation problem for a single source, but in inference about the 
underlying galaxy population based on the set of sources that we have detected. In that case, the uncertainty comes 
from the fact that a particular galaxy population does not predict uniquely the events we will see, but only the rate 
at which events occur. We will use Bayesian inference to quantify the statements we can make about the properties of 
galactic black holes, based on the number of EMRI sources of each type that LISA sees. The probability distribution 
for the set of events will be discussed in the next section. 



B. LISA event probability distribution 

We now turn to the question of modelling the likelihood, p(D\X, H). The data, D, consists of the parameters of 
all of the events that LISA has observed. These parameters will not be measured exactly, but from the analysis of 
the LISA data stream we will obtain a posterior probability distribution for the parameters of each event, and for 
the number of events in the data set. We will discuss how these errors may be included in the analysis in the next 
section, but for the following discussion we shall assume that the parameters of each event are known exactly. For 
clarity in the following description, we also assume that the parameter space of possible events has been divided into 
a finite number of bins. Results can also be obtained straightforwardly in a continuum analysis by taking the limit 
as the bin sizes tend to zero. This limit will be discussed at the end of this section. 

In a binned analysis, the data is the number of events in each of the bins. The number of events is determined 
by two things — the number of events with particular parameters that occur in the Universe, and the sensitivity of 
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LISA to events with those parameters. In a given galaxy, EMRIs with particular parameters will finish, in a plunge 
into the central black hole, at a certain rate. We can model the number of plunges that occur in a certain time (or 
the number of EMRIs that start in a certain time window) as a Poisson process with mean equal to that rate. It is 
the fact that any given model predicts only the rate of EMRIs that gives rise to uncertainty. Only inspirals that have 
started in the right range of times will be detected by LISA, so the probability distribution of observed events follows 
this Poisson distribution. The rate for systems with particular parameters, A, is the product of the number density 
of galaxies with those parameters, N(X), with the intrinsic rate at which EMRIs start in such galaxies, 1Z(\). 

The sensitivity of LISA can be characterised by a completeness function, C(X). A given event will only be detected 
if it has sufficient signal-to-noise ratio (SNR) to be identified in the LISA data. The completeness is the probability 
that an event with particular parameters is detected and will depend on the data analysis pipeline. It can either be 
computed in advance or by using the cleaned LISA data stream via numerical injection. One reasonable model for 
C(A) is that all events with SNR p > p t h will be detected 100% of the time, and those with p < p t h will be detected 
0% of the time. This is a good model if an SNR cut is imposed on events included in a follow-up analysis of the type 
described here. One of the parameters in A is the time remaining until plunge, t p \, at the start of the LISA observation. 
For fixed values of the other parameters, an SNR cut defines an allowed range of values for t p \. If a given EMRI is 
too close to plunge at the time when LISA starts taking data, then insufficient SNR will be accumulated before the 
object plunges for a detection. Conversely, if the EMRI is too far from plunge, the gravitational radiation will be 
weak, and so even over the full lifetime of LISA the SNR accumulated will be insufficient for detection. This range of 
times defines the observable lifetime of a source, r(A). Using the SNR cut model for completeness, we can eliminate 
i p i as a parameter and replace C(A) by r(A). The function r(A) can be computed in advance, and depends only on 
the properties of the detector. It was computed for circular, equatorial inspirals in [17) . using an SNR threshold of 
Pth = 30. We will use this simplified model for the completeness in the present paper. With a more complicated 
completeness function, we can still eliminate the time-to-plunge parameter by defining an effective observable lifetime 
as an integral of the completeness function over t p i, t = C(X)dt p 

Typically, a given bin will contain events from several galaxies, but these galaxies will behave independently and the 
sum of two independent Poisson distributions is a Poisson distribution with mean equal to the sum of the means. The 
number of events in a given bin is therefore determined by a Poisson process with mean equal to the expected number 
of events in that bin under the particular model. We describe the observed data by a vector n = {n\,n2, ...,uk) of 
the number of events ni in each bin i. The probability of the data under a model H depending on parameters p is 



K^ )g) =n (2) 

2 — 1 

where the model-dependent rate, r 2 , in a given bin is equal to the integral of the rate over the bin, B{ 

n{p)= I n{\\p)k{\\p)t{\)&\ (3) 



As before, N(X\pT) denotes the number density of galaxies, such that J B N(X\p)dX is the number of galaxies that 
contribute to the events in bin i when the model parameters are p. In practice, we work with log-likelihoods, so that 
the total log-likelihood is equal to the sum of the log-likelihoods for each bin. More generally, r(A) can be replaced 
by C(A) and t p \ included in A. 

The completeness and observable lifetime depend only on the detector, but the intrinsic rates N(X\p) lZ(X\p) depend 
on the properties of the underlying black hole distribution through p. It is this latter that we will try to measure 
using EMRI observations. The dependence of the EMRI rate in a given system, 7l(X), on black hole properties is 
somewhat uncertain. Hopman [16] found that 1Z(X) varied with the mass of the central black hole, M, as 

/ M x-0.15 

K = ^OGyr" 1 - . (4) 

The derivation of this equation assumed (i) an isothermal (cuspy) distribution of stars around the black hole, and 
that (ii) the M — a relation holds at all galaxy and black hole masses. However, if low-mass black holes (1O 4 M < 
M < 10 5 Mq) are hosted in dwarf galaxies, which are analogues of the Milky Way satellites [TS], one or both of these 
assumptions may break down. In particular, the central region of such galaxies is dominated by dark matter, and the 
density displays a shallow core. The M — a relation is unknown at low galaxy and black hole masses, which means that 
there is uncertainty in assumption (ii) , but there is no robust alternative model that could be used insted. The validity 
of assumption (i), the density profile, can be assessed using the observational properties of galaxies. If the central 
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density of stars, rih, is constant then we can express the sphere of influence of the black hole as = 3M / (Anm* nh), 
where m* is the typical stellar mass. The normalisation of the EMRI rate per galaxy (Equation 8 in [16]) scales then 
as n^M 1 / 4 instead of M^ 1 / 4 . Studies of the demography of black holes in low-mass galaxies will be beneficial to 
improve our understanding of the mass scaling of EMRI rates. Since we can hope that observational and theoretical 
work will improve our understanding of 7?.(A) before LISA flies, for the purpose of this work we will take 7?.(A) as 
known, and given by Eq. Q . We will therefore present the results in terms of how well LISA can measure the mass 
function, N(X), which is rather poorly constrained for black holes in the LISA range. However, the uncertainties in 
the scaling of 7?.(A) must be borne in mind when interpreting the results. 

Continuum limit 

If we take the limit where the size of all the bins tends to zero, then equation ^ becomes 

p(Z\p,H)=e- N »l[r(\ i \fl) (5) 

i=l 

where A = {A 1; A2, • • • , Ajv } is the vector of measured parameter values, Aj, for source i, N is the number of observed 
events and iV M is the expected number of events under the model parameters ft. This continuum limit is of the form 
that we would expect — we have a product over all of the observed events of the probability that the event would be 
seen under the model parameters fl, with a weighting factor of e - ^ which penalises models that predict too few or 
too many events. 

In the analysis for this paper we will use the binned event distribution, as it makes the generation of data and the 
inclusion of parameter errors more straightforward, and we will check that the size of bins used does not significantly 
affect the results. The final analysis of LISA data will most likely use the continuum limit, but the broad conclusions 
about the precision with which LISA will be able to measure the properties of the black hole distribution should not 
be sensitive to the use of binning. 



C. Allowing for parameter measurement errors 



In the preceding section we assumed that we took the parameters measured by LISA at face value when assigning 
sources to bins. However, noise in the detector has the effect that the measured parameters for a source, A OJ will differ 
from the true parameters, At- If the error distribution is narrow compared to the size of the bins then it should be safe 
to ignore parameter errors in the analysis, as the majority of events will be assigned to the correct bin. This is one 
advantage of using binning. We will look at this further when we discuss our results in Section |III| In the continuum 
limit, the approximation of ignoring errors will also be OK provided that the rate predicted by the model does not 
change significantly over the typical width of the measured parameter distribution. 

If the error distribution is broad compared to the scale over which the predicted rate varies, then parameter errors 
must be folded into the analysis, as outlined below. 



Continuum analysis 



The treatment of errors in the continuum limit is relatively straightforward theoretically. Our data is the output 
of the detector, s, which is a combination of N signals, hi, with parameters Ai and instrumental noise, n. The 
probability that we would observe this data given our model and model parameters jl is the integral over all possible 
parameters values, {A^}, for the sources, of the product of the probability that the model Universe would yield these 
events (equation J5|) with the probability that the noise in the detector would equal s — hi, i.e., 



p(D\fl,H)=e 



-JV„ 



p n 



N 



d fc A!d fe A 2 ---d fc A 



N, 



(6) 



where we use k to denote the dimensionality of the parameter space for a signal. The first term inside the square 
bracket is the usual posterior probability distribution for the source parameters as computed from the LISA data. If 
the sources are well separated in parameter space, this is the product of the posterior probabilities for each individual 
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source, but in the case of confused sources this is a multi-source posterior. We note that the above equation takes the 
form of an integral over the multi-source posterior probability distribution, which is exactly the type of problem for 
which Monte Carlo integration was designed. If we assume that the posterior pdf has been computed using Markov 
Chain Monte Carlo (MCMC) techniques, using appropriately uninformative priors, then the above integral can be 
evaluated directly as a sum over the MCMC samples of the rate for the sample parameters (Eq. J5])). This follow-up, 
including errors, can therefore be done relatively cheaply once the initial source identification and characterisation has 
been completed. To be consistent, the posteriors on the parameters of each individual source should be obtained by 
integrating over the parameters, p,, of the model for the Universe. These posteriors will therefore have to be recomputed 
in post-processing and will depend on both the particular population model used and on the other sources present 
in the data set. However, this calculation can also be done cheaply without repeating the MCMC sampling and, in 
practice, if the rate function varies slowly over the size of a typical error box for an EMRI event, then the posteriors 
should not change significantly. The rate function r(Xi\fl) plays the role of a prior on the waveform parameters A;. 
In sampling literature the parameters that determine this prior are usually referred to as hyperparameters and any 
prior on /2 as a hyperprior. 

In the above we have assumed that the number of sources is fixed and known. In practice, the survey will not be 
complete and we will have some distribution over the number of events that we have detected. The expression is 
similar in that case, but the integral will also extend over the number of observed events, N Q . 

While this work was being completed, we became aware of work on a related problem being carried out con- 
currently [IH] in the context of LIGO. The focus in [TU] was on the evaluation of the integral ^ including error 
distributions on the source parameters. Here we concentrate on what we can learn about a particular model for 
the underlying population of black holes harbouring EMRIs. While it will be important to include parameter mea- 
surement uncertainties in the actual analysis of LISA data, the details of these errors should not affect our general 
conclusions about what LISA will be able to achieve. For this reason we ignore errors in our analysis, but show that 
the inclusion of errors in the generation of the source population does not significantly alter the conclusions. We leave 
a fuller analysis to a future paper. 

2. Binned analysis 

In a binned analysis, parameter errors will cause sources to appear in the wrong bin. Errors can be folded into 
the analysis in two ways, depending on whether the data is binned according to the intrinsic source parameters, or 
according to the observed parameters. 

a. Binning according to intrinsic parameter values A LISA observation of a given event will give a maximum 
a-posteriori set of parameters, and a distribution of the possible parameter values. Each event can then be assigned 
to one of a number of bins, with certain probabilities. If this is done for each event in turn, we obtain a set of possible 
intrinsic event distributions with relative probabilities. Each distribution can be analysed and a sum of the posteriors 
can be computed, weighted according to these probabilities. This approach has the advantage that it makes use of the 
actual measured errors for each source, and therefore gives proper relative weighting to the events. The disadvantage 
is that the number of possible intrinsic event distributions grows rapidly with the number of observed sources, N Q . If 
each source can spread into ~ TVb bins, the number of possible distributions that we would have to consider would 
grow like . 

b. Binning according to observed parameter values Binning according to observed parameter values produces a 
single data set to analyse. The probability distribution of observed events can be computed using the expected error 
distribution for a source with given parameters A. We show in appendix [X] that when an underlying Poisson process 
with mean (3 spreads into multiple bins, indexed by i, with known probabilities, Pi, then the observed distribution of 
events in each of the bins is drawn from an independent Poisson process with mean pij3. This result is straightforward 
to derive and probably well known, but we include the result in the appendix for completeness. This property means 
that we can fold parameter errors into the analysis just by modifying the rate in each bin. In [TS], the LISA "error 
kernel" for supermassive black hole mergers was defined in essentially the same way. The disadvantage of this approach 
is that it assigns average errors to the events, and does not make use of the measured error distributions. For this 
reason, we advocate using the continuum analysis when accounting for parameter measurement errors, as the posterior 
pdf required to evaluate the integral ^ will have already been computed. The continuum analysis also makes full use 
of all the information available in the LISA observation. In the present work, we employ the binning analysis because 
posterior pdfs are not readily available. 
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D. Black hole mass function 



We require a model for the mass function of black holes, N(X), in the mass range 10 4 M Q < M < 1O 7 M , where 
LISA is sensitive to EMRI events. In this mass range the total mass function of black holes, including both active 
and quiescent systems, is poorly constrained. The mass function of quiescent black holes is estimated in the literature 
by coupling the empirical correlations found between black hole mass and host properties (bulge mass, luminosity 
and velocity dispersion, see [21 [2Ql and references therein]) with the distribution of galaxies as a function of these 
properties: 

dn dn dX 

d~M = d~Xd~M' ( ' 

where X is either the velocity dispersion, a, the bulge mass, Mb u i g e, or the bulge luminosity Lbulge- 

However, most of the quiescent black holes masses that led to the determination of these correlations between M and 
X are above 10 7 M Q . Additionally, even assuming that the correlation between black hole mass and bulge luminosity 
holds at all masses, the luminosity function of galaxies is also limited at faint fluxes, adding to the uncertainty [see 
the detailed discussion in 121]. 

For instance, [22] estimate the velocity dispersion function for early-type galaxies in the Sloan Digital Sky Survey, 
in a flux limited sample. In order to obtain the complete distribution, including both early and late type galaxies, 
it is necessary to take several steps (convert the luminosity function of galaxies which are not early types into a 
distribution of circular velocities, and then convert circular velocity to velocity dispersion using empirical correlations), 
as measurements of the velocity dispersion are available only for galaxies which are early-types. 

Among the three properties mentioned above (er, A/buigo ^buige) the best constrained galaxy distribution comes 
from the luminosity function (the number density of galaxies per unit Lbuigc)- Extrapolating both the M — L^ u i gc 
correlation, and the luminosity function of galaxies, [21] determine a black hole mass function that appears to be flat 
at M < 3 x 1O 6 M . This is likely an upper limit to the mass function at low black hole masses. [21] compare the 
mass function of quiescent black holes thus determined to the mass function of active black holes (clearly, a lower 
limit to the total black hole mass function). They find that the mass function of active black holes turns over at 
M < 3 x 1O 6 M0, leading to a decreasing number density of small black holes. EMRIs could therefore be critical to 
constrain the total mass function of black holes (active and inactive) at low masses. The mass function at M < 1O 5 M 
is also a critical test for models of black hole formation [51 [IS] . 

In this paper we adopt the mass function of 'inactive' black holes given in Greene and Ho [211 long-dashed line in 
their Figure 11a]. It is relatively flat for low black hole masses, but turns over at M ~ 10 7 M Q and then decreases for 
larger black hole masses. Such a functional form is well fit by a simple ansatz of the form 

dn A{M/AQ a 

\°) 



dlogAf 1 + B{M/M*Y ' 

We find that with M» = 3 x 1O 6 M , the best fit to the Greene and Ho data has parameters 

A = 0.00284 Mpc" 3 , a = 0.0311, B = 0.041, /3 = 1.5105. (9) 

If we force the slope to be flat at low masses by setting a = 0, we instead find the parameters 

A = 0.00269 Mpc~ 3 , a = 0, 5 = 0.031, /3 = 1.5262. (10) 

In Figure [l] we show the black- hole mass function from Greene and Ho for inactive and active galaxies along with the 
two fits given above. We also show the black hole mass function obtained from the Sloan Digital Sky Survey for early- 
type galaxies, and the corresponding mass function inferred for all galaxies via the procedure described above [22) . 
These curves are included to illustrate the degree of uncertainty in the literature about the mass-function in the LISA 
range. It is clear that a relatively precise measurement of the slope of the mass function would significantly enhance 
our understanding of the low-end of the mass function. 

If LISA could probe all black hole masses, we could hope to determine all four parameters in a fit of the form pi) 
from LISA observations. However, LISA is only sensitive to gravitational waves from black holes in the range 10 4 Af Q < 
M < 10 7 A/ Q . It is clear from Figure [l] that in this range the mass function is relatively flat. For that reason we 
choose to adopt a simpler prescription for the mass function in the range of interest for LISA, namely a simple power 
law d?i/dlog Af cx M a . 




FIG. 1: Mass function of black holes taken from Greene and Ho |21| and Sheth et al. |22| , divided into type as labelled. The 
dashed cyan curves are fits to the Greene and Ho data, of the form given by the ansatz in Eq. Q. We show the two fits 
described in the text — one with all four parameters allowed to vary freely and one in which the function was forced to be 
flat for low masses by setting a = 0. The two fits lie almost on top of each other, but the curve that is slightly higher for 
M = 1O 7 M is the fit with a / 0. 



E. Markov Chain Monte Carlo 



To recover the posterior probability distribution using Eq. Q we make use of Markov Chain Monte Carlo techniques 
(MCMC), which provide an efficient way to explore posterior probability distributions. For the redshift-independent 
model we consider below, it is relatively straightforward to evaluate the posterior directly without using Monte Carlo 
integration. However, for higher dimensional problems the gain from MCMC methods becomes increasingly significant. 
We used MCMC methods to obtain all of the results reported in this paper to demonstrate the general framework. 

The procedure for MCMC is to construct a sequence of points in the parameter space {xi}. The initial point, xq, 
is chosen at random from the prior. At a subsequent iteration, i, a new point y is drawn from a proposal distribution 
q(jj\xi) and we evaluate the Metropolis-Hastings ratio 

R= K{y)p{D\y,H)q{xj\y) 
ir(xi)p(D\xi, H)q(y\Xi) 

A random sample, u, is then drawn from a uniform distribution u £ t/[0, 1] and if u < R, the move is accepted and we 
set Xi+i = y, otherwise the move is rejected and we set — x"i- If R > 1, the move is always accepted, but if R < 1 
there is still a chance that the chain will move to the new point. If the points in the sequence are generated via this 
algorithm, then the distribution of points visited by the chain follows the posterior posterior probability distribution 
of the parameters. 



III. RESULTS 



In order to simplify the calculation for this first analysis, we make various assumptions about the parameter space 
of EMRIs. We assume that all EMRIs are circular, equatorial inspirals into black holes with a fixed spin and that 
prograde or retrograde inspirals are equally likely. We also ignore sky position and orientation dependence of the LISA 
response, and use position and orientation averaged SNRs. These assumptions allow us to use the observable lifetime 
functions tabulated in [17] and mean that the only parameters that we are interested in for each event are the mass 
of the black hole, M, and the redshift of the source, z. While this considerably simplifies the current analysis, the 
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framework outlined in the previous section does not rely on these assumptions and so the present analysis could be 
easily extended to include more parameters as required. In |17j results were presented for three different values of the 
fixed spin of the central black holes, a = 0, 0.5, 0.9, and for two different sets of assumptions about the performance 
of the LISA detector. The "optimistic" assumptions were that the LISA mission lasted for 5 years and had full 
functionality for the whole time. With full functionality, the LISA response is equivalent to two independent right- 
angle interferometers, but if there is a failure on the satellite LISA can still operate with the equivalent response of a 
single right-angle interferometer. The "pessimistic" assumptions were that LISA was operating in this failure mode 
for the whole mission, and the mission only lasted for 2 years. We will take a = and the optimistic LISA initially, 



but will describe how the results change when we take a = 0.9 or the pessimistic LISA in Section III A 4 



As described in the previous section, each LISA measurement will have an error in the mass and redshift determi- 
nation. These can be folded into the analysis, but for this first computation we ignore the errors and assume that the 
parameters are measured perfectly. The typical error we expect in a measurement of the redshifted mass, M(l + z), is 
O(10~ 4 ), which is almost certainly ignorable. However, the redshift must be inferred from the distance, in which we 
expect an error of O(10 -2 ), and this propagates into a comparable error in M . Further error arises from uncertainties 
in the cosmology used to map between the measured luminosity distance and the redshift. Over a large population 
of events, we expect these errors to average out, and they should also be unimportant if the scale over which the 
event rate varies is larger than the scale of the typical errors. These arguments justify the omission of errors in our 
first analysis, but we will verify in Section |III A 1| below that our results do not change significantly when errors are 
included in the generation of the observed event distribution. 



A. Redshift-independent mass function 



As a test case, we took the mass function of black holes in the LISA mass range, 1O 4 M < M < 1O 7 M0 to be 

dlogM =Ao Uj ■ ^ 



II B 



using 



with Af* = 3 x 10° Mq. We generated data sets from the probability distribution described in Section 
the values A = 0.002Mpc~ 3 and a = 0. These data sets were then explored using MCMC techniques to recover 
posterior probability distributions for Ao and ao- For a two dimensional problem of this nature, MCMC techniques 
are not necessary for posterior recovery, particularly when the likelihood takes such a simple analytic form. However, 
numerical integration is still required to marginalise the posteriors and MCMC techniques provide a quick and efficient 
method for doing that. Using MCMC methods in this simple problem also allowed an easier extension to more 
complicated models, such as the redshift-dependent mass function that we will describe later. The recovered ID 
posterior distributions for a typical case are shown in Figure [2j and the corresponding 2D posterior (correlation plot) 
is shown in Figure [3] These posteriors are typical of what we would obtain from analysis of the LISA data. Both 
A and ao will be measured relatively precisely, and we see that the true parameters, A = 0.002Mpc -3 , ao = 0, 
are consistent with the mean and width of these posteriors. Figure [3] indicates that Ao and ao are correlated. This 
correlation corresponds to keeping the number of black holes with M ~ 5 x 1O 5 M0-1O 6 M0 approximately constant. 
LISA is most sensitive to events of this type, and any black hole distribution that predicts approximately the correct 
number of such events will be reasonably consistent with the data. 




0.0005 0.001 0.0015 0.002 0.0025 0.003 0.0035 
An 




FIG. 2: Distributions of the mass function parameters Ao (left) and ao (right) that could be inferred from the set of LISA 
events. The solid green lines are the best-fit Gaussians to In Ao and ao, obtained via fitting a quadratic to the logarithm of the 
distribution. 
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FIG. 3: Correlation between the values of Ao and ao inferred from the set of LISA events. 



We find that the posterior distributions for both hiA and ao are well fit by Gaussians, which can be found by 
using a linear least squares fit of a quadratic to the logarithm of the distribution. The best-fit Gaussians are also 
shown in Figure [2] These Gaussian fits provide a useful way to characterize different realisations of the set of EMRI 
events, drawn from the same underlying galaxy black hole distribution. For a given realisation we can compute the 
posterior, obtain a Gaussian fit of the form £?exp{— (x — ^jl) 2 /2a 2 } and record the mean, /x, and standard deviation, 
(7 of the Gaussian. We can also compute the "error" in the mean, namely (/i — X)/a, where X is the value of the 
parameter (A or a) used to generate the underlying distribution. We would expect the true value to lie within one or 
two standard deviations of the mean, and so this last quantity should be small. In Figure [4] we show the distribution 
of these three quantities for both A and ao over 100 realisations of the measured EMRI distribution. 

The distribution of the Gaussian means resembles the posterior of the parameter in each case, and the distribution 
of the errors looks approximately Normal. This is what we would expect, since we are using the same model to 
construct the data set and to analyse the data. The more interesting quantity is the standard deviation, since this 
is a measure of the accuracy to which we can measure the parameters of the underlying distribution. We see that 
the distribution of standard deviations is fairly narrow for both InAo and ao, and we deduce that, for this choice of 
Aq and ao, we would be able to measure In Ao to a precision of 0.08, and ao to a precision of 0.025. In subsequent 
sections when we consider the effects of measurement errors, bin size and different choices for Aq and ao, we will use 
a to characterise the measurement precision. 



1. Inclusion of parameter errors 

As mentioned earlier, the mass and redshift of each source that LISA detects will be subject to measurement errors. 
In Section II C[ we described how these errors could be included in the analysis in a consistent manner, given properly 



sampled posterior probability distributions (pdfs) for each of the detected sources. In the current analysis, we wish 
to avoid the overhead of recovering such pdfs and have been taking the parameter values of each source at face value. 
In order to verify that our results are insensitive to these errors, we also carried out an analysis in which an error 
was added to the parameters (M and z) of each source after they were generated, but before they were assigned to a 
particular bin. We then carried out the analysis of the data set including errors in the same way as before. 

Following [23], we took the error in the redshift to be A(lnz) = 0.05z for a source at redshift z. The error in the 
rcdshiftcd total mass of the source, M z , will be small |TT] in all cases, but we need the intrinsic mass of the source 
which is M z /(1 + z). The error in redshift will therefore propagate into the mass also, and so we included a mass 
error of the same magnitude, A(lnM) = 0.05z. The errors for each source were drawn from Gaussians with these 
widths. Over 100 realisations of the LISA event distribution, we computed Gaussian fits to the posteriors. We found 
that the distribution of the standard deviation, mean and "error" of these fits were indistinguishable from the results 
presented in Figure [4] when measurement errors were ignored. In Figure [5] we show the distribution of the standard 
deviations found in this way. These results indicate that it is reasonable to ignore parameter errors when analysing 
the results in this way and so in all subsequent parts of this paper we will ignore parameter errors when generating 
the observed EMRI distribution. 



11 



0.0016 0.0017 0.0018 0.0019 0.002 0.0021 0.0022 0.0023 0.0024 0.0025 0.076 0.078 0.08 0.082 0.084 0.086 0.0 




-2 -1.5 -1 -0.5 0.5 1 1.5 2 2.5 3 



-0.08 -0.06 -0.04 -0.02 0.02 0.04 0.06 0.08 0.0245 0.025 0.0255 0.026 0.0265 0.027 0.0275 





I 


1 


H 




- \ 



FIG. 4: Distribution of the mean (left) and standard deviation (middle) of Gaussian fits to the posterior for InAo (upper row) 
and ao (lower row) over 100 realisations of the set of observed EMRI events. In the right hand figures in each row we show the 
distribution of the "error" in the mean of the distributions, i.e., AAo = (fj, — Aq)/<j or Aao = (fi— ao)/a. 
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FIG. 5: Distribution of the standard deviation of a Gaussian fit to the posterior for In A) (left) and Qo (right) over 100 
realisations of the set of observed EMRI events, but now including measurement errors in the construction of the observed 
event set. 



2. Dependence on bin size 



In the preceding section we took bins of particular size in mass and redshift space, but to verify the consistency of 
the results we explored how things changed as we varied the size of the bins used. We repeated the analysis using 
bins of one half and one quarter of the width used in the original analysis. In Figure [6] we show the distribution 
of standard deviations of the best-fit Gaussians when using these smaller bin sizes. These do not show particularly 
significant differences from the original analysis, but the typical widths of the posteriors are slightly larger when using 
smaller bin sizes. This arises because the number of events in a typical bin is smaller when the bin size is reduced, so 
if the bins are too small, the generation of the LISA event distribution that we search tends to be noisier. If the bins 
are too large then we start to lose the ability to discriminate models. Through experimentation we found bin sizes of 
8z ~ 0.1 and <51nM ~ 0.2 worked well and we used these bin sizes to obtain the results presented elsewhere in this 
paper. 
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FIG. 6: Distribution of standard deviations of best-fit Gaussians to the recovered posterior over 100 realisations of the set of 
observed EMRI events for ln^lo (top plots) and Qo (bottom plots). These results were obtained using bin sizes one half (left 
panels) or one quarter (right panels) of the bin size used in the original analysis. 



3. Dependence on parameters of observed distribution 

To explore the dependence of the error in the recovered parameters on the parameters of the underlying distribution 
from which the observed events are drawn, we repeated the computation for a range of choices of Aq and a$. In Figure[7] 
we show how the typical parameter errors vary with these choices. The parameter errors were estimated by averaging 
the standard deviations of the best-fit Gaussians over 10 realisations of the observed EMRI events. If the relative 
number of different types of events are equal, then the total number of events observed, N \^ s , plays the role of the 
signal to noise ratio squared, and we expect the error in the recovered parameters to scale with one over the signal- 
to-noise ratio. Therefore, we would expect the accuracy with which we can measure the parameters to scale roughly 

— 1/2 

as N ohs . The parameter Aq is an overall normalisation and so we would expect the change in the number of events 
to completely explain the dependence of the error estimate on this parameter. The slope, otQ, also affects the relative 
numbers of events of different types and therefore there could be other effects in that case. We can factor out the 
dependence on iV^hs by plotting the errors as a function of the number of events. In Figure|S]we replot all of the data 
shown in Figure V7j but now with N ^ s on the horizontal axis. We also show best-fit lines of the form kN~^J 2 . The 
value of iV bs for each case can be obtained from fits given in [T7] . 

It is clear from Figure [8] that the change in N b s between different models explains virtually all of the change in 
the precision with which we can determine the model parameters. The best-fit curves give us the nice rule of thumb 
that A(lnAo) ~ 0.08y / 1000/./V o b s and A(ao) ~ 0.025 -^/XOOO/iVobs when we make optimistic assumptions about LISA 
and assume that all the central black holes have spin a = 0. In the next section we will see what happens when we 
relax the latter two assumptions. 



4- Dependence on LISA performance and black hole spin 



All of the preceding results have used optimistic assumptions about the LISA detector and assumed that all the 
black holes into which EMRIs are falling have spin a = 0. To relax these assumptions we need only to change the 
observable lifetime function, since we are not including spin as a parameter in the model. In |17j . observable lifetime 
functions were also given for a pessimistic LISA configuration, and for a case in which all black holes had spin a = 0.9. 
In Figure [9] we show how the precision with which we can measure the parameters of the underlying population change 
when we use these different assumptions. We plot the results for optimistic LISA assumptions and spin a — for 
comparison. 
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FIG. 7: Typical parameter estimation error as function of the parameters of the underlying distribution. We show the error in 
In Aq (left panel) and in ao (right panel) as a function of the value of ceo, for various choices of Aq, as labelled in the key. 



0.25 



< 

< 



0.15 




0.05 



1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 
N 




1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 
N 



FIG. 8: The same results that are shown in Figure [7] but now plotted as a function of the total number of events that would 
be observed for the particular model considered. The left panel shows the error in InAo, and the right panel shows the error 
in ao- We also show best-fit curves of the form kN . . 



Once again, we find that much of the dependence on Aq and ao can be explained by the change in the number 
of observed events that the model predicts. We can find best-fits to the dependence of the errors on N ^ s for 
each of the differ ent sets of assumptions. As for the a = 0, optimistic LISA case, this dependence is well fit by 
A(\nA ) = fcyioVlOOO/iVobs and A(ao) = k aa ^/l000/A o bs- The parameters fc^ and k ao for all cases are given in 
Table [I] It is clear that the difference between optimistic LISA and pessimistic LISA comes primarily from the change 
in the number of events we are likely to observe, but if black holes tend to have significant spins we will be able to 
determine the model parameters more precisely, even with the same number of observed events. This arises because 
LISA can see spinning black holes over a wider range of redshifts and masses. 



B. Redshift dependent mass function 



As an extension of the model, we modified the simple ansatz described above to allow for simple evolution of the 
mass function with redshift 
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FIG. 9: As Figure [7j but now showing results for various assumptions about LISA and about the spin of the central black holes 
of the EMRI sources. In all cases we have taken A — 0.002Mpc -3 and different curves correspond to different assumptions, as 
labelled in the key. Cases labelled "spin" assumed all black holes have spin a = 0.9. 





a = 

Optimistic LISA Pessimistic LISA 


a = 0.9 

Optimistic LISA Pessimistic LISA 


k Ao 


0.08 0.084 
0.025 0.028 


0.054 0.053 
0.021 0.022 



TABLE I: Parameters describing how the uncertainty in the measured parameters varies with the number of observed events. 
As described in the text, the dependence is well fit by A(X) = kx\/W00/N o t, s . This table contains the kx parameters for the 
two different sets of assumptions about LISA and the two choices of black hole spin, a. 



with M* = 3 x 1O 6 M0 again. In this case, for our reference data set we took Aq and ao to have the same values as the 
non-evolving case, and additionally chose A\ = a\ = 0. In other words, we took the real Universe to have no evolution 
in mass-function, but allowed the model to include mass function evolution. The distributions in Ai and ot\ then tell 
us how much evolution there would have to be for LISA EMRI observations to detect it. The recovered distributions 
for these four parameters are shown for a typical example in Figure 10 and the corresponding 2D posteriors are 
shown in Figure [ll] We can fit Gaussians to the distributions of InAo, A\, ao and ol\ in this case as well. The 
best fit Gaussians are also shown in Figure [10] It is clear that the Gaussian approximation again provides a good 
estimate of the mean and width of the distribution and hence the precision to which we can measure the various model 
parameters. As in the redshift-independent case we find that the true parameters are consistent with the recovered 
posteriors in all cases, and there are strong correlations between the model parameters. The interpretation of these 
correlations is the same as before, i.e., that they correspond to keeping the number of events of the type to which 
LISA is most sensitive approximately constant. 

In Figure 12 we show the distribution of the Gaussian widths over 100 realisations of the set of observed EMRI 
events. We do not show the Gaussian means or the "errors" as the former were consistent with the recovered 
posteriors and the latter were consistent with a Normal distribution, as we would expect and as we saw in the redshift 
independent case. Unsurprisingly, we find that the addition of extra parameters in the model decreases the precision 
to which each of the model parameters can be measured, with typical errors in ItiAq of ~ 0.2 and in ao of ~ 0.055. 
The errors in the evolution parameters are typically ~ 0.75 for A\ and ~ 0.2 for a.\. These latter errors are quite 
large, which suggests that we would not be able to say with any confidence that there was an evolution in the mass 
function. This conclusion will depend on the actual amount of evolution, but these results suggest that the true \A\\ 
would have to be at least 0.75 or the true |ai| would have to be at least 0.2 in order for a redshift evolution to be 
apparent from the data. 
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FIG. 10: As Figure [2| but now for the redshift dependent mass function described in Eq. dl3p . The four panels show, from left 
to right, the posterior distributions for Aq, Ax, qq and ct\. 




FIG. 11: As Figure [3] but now for the redshift dependent mass function described in Eq. fll3B. The four panels show the 
correlations for Ao/A~\ (top left), Ao/oto (top middle), Ao/otx (top right), Ai/oeo (bottom left), Ai/cti (bottom middle) and 
oto/ai (bottom right). 



1. Dependence on parameters of observed distribution 



As in the redshift-independent case, it is informative to consider how the precision to which we can determine the 
parameters varies with the true parameters of the Universe. For this investigation, we chose to keep A\ = ax = 
in all cases. This allows us to interpret the errors in A\ and u\ as the amount of evolution that would have to exist 
for the LISA EMRI observations to be manifestly inconsistent with a non-evolving mass function. In Figure [13] we 
show how the typical errors in all four parameters vary as we change Aq and ao- We see that the error in A± is 
typically ~ 0.5-1, and the error in a\ is typically ~ 0.1-0.4. There would therefore have to be a very significant 
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FIG. 12: Distribution of standard deviations of the best-fit Gaussians to the posteriors, computed over 100 realisations of the 
set of observed EMRI events. The plots are for \iiAq (top left), ao (top right), A\ (bottom left) and a.\ (bottom right). 
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Optimistic LISA Pessimistic LISA 


a = 0.9 

Optimistic LISA Pessimistic LISA 




0.2 


0.15 


0.15 


0.13 




0.056 


0.043 


0.044 


0.041 




0.72 


0.64 


0.30 


0.38 




0.21 


0.24 


0.092 


0.15 



TABLE II: As Table [T] but now for the redshift-dependent mass function. The dependence of the parameter uncertainty on 
the number of observed events is of the form A(X) = kx ^/lOOO/A^bs, for the kx values tabulated here. 



change in the mass function between z = and z = 1 for it to be measurable. These results lend further support 
to our previous conclusion that we will not be able to measure an evolution in the black hole mass function through 
EMRI observations alone. 

Once again, much of the dependence of the errors on the underlying model can be explained by changes in the 
total number of observed events that the different models predict. All of the data shown in Fi gure [l~3| ca n be 
well approximated by A(\nA ) w 0.20^/lOOO/AUs, A(a ) « 0.056^1000/^^, A(^i) « 0.72 V^OO/^ote and 
A(o-i) « O^Ia/IOOO/AUs. As before, these results were found under the optimistic LISA assumption and assuming 
that all black holes have spin a = 0. We now consider what happens when we relax these assumptions. 



2. Dependence on LISA performance and black hole spin 

As before, we can explore how these results change when we vary the assumptions about LISA and about the 
black hole spins by modifying the observable lifetime function used in the analysis. We consider pessimistic LISA 



assumptions, and the case where all black holes have spin a = 0.9, as we used in Section III A 4 for the redshift- 
independent case. In Figure [14] we show how the parameter errors vary under these various assumptions. The 
parameter errors are once again well fit by functions of the number of observed events of the form A(lnA ) = 

fcAoVlOOO/JVob., A(a ) = fc Qo Vl000/AU s , &(Ai) = k Al y/1000/N oba and A(a x ) = k ai a/1000/AW The various k x 
parameters are summarised in Table [TTJ As in the redshift-independent case, there is not much difference between 
optimistic and pessimistic LISA assumptions, other than in the number of events detected, but measurement precisions 
are somewhat improved if black holes are spinning. 
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FIG. 13: Typical parameter estimation errors as a function of the parameters of the underlying distribution. We show the 
error in \nAo (top left), ao (top right), A\ (bottom left) and Qi (bottom right) as a function of the values of Aq and ao of 
the underlying distribution. We keep Ai = ai = for all cases. The value of ao is on the x-axis, while each line represents a 
different choice for Ao, as labelled in the key. 



IV. CONCLUSIONS AND FUTURE WORK 



We have explored the ability of the proposed space-based gravitational wave detector, LISA, to probe the properties 
of black holes through observations of extreme-mass-ratio inspiral events. We have presented a general framework for 
addressing such questions, and considered a particular special case in which we imagine that the events observed by 
LISA are divided into bins in mass and redshift. Assuming that the only model uncertainty is in the unknown number 
density of black holes in the LISA range, 1O 4 M0 < M < 10 7 M Q , and taking a simple, non-evolving, power-law model 
for the black hole mass function, dn/dlnM = Ao(M/M*) a °, we conclude that LISA will be able to measure the 
amplitude of the mass function to a precision A(lnj4o) ~ 0.08 and the slope to a precision A(arj) ~ 0.03. These 

— 1/2 

precisions scale with the number of observed events like N ohs and have been normalised to a reference model with 
N bs ~ 1000. The present uncertainty in the slope of the mass function in the LISA range is of the order of ±0.3, so 
LISA will beat this with as few as 10 detected events. 

If we allow for the mass function to evolve with redshift, using the ansatz dn/dlnM = A (l + z) Al (M/M*) a °~ aiZ , 
but assume the Universe has a non-evolving mass function with A\ = a\ — 0, we find LISA will be able to measure 
the parameters of the distribution to a precision A (In Ao) ~ 0.2, A(a ) ~ 0.06, A(Ai) ~ 0.7 and A(ai) ~ 0.2. These 

— 1/2 

again show a scaling like iV obs ' and are normalised to a black hole distribution that predicts 1000 events. The errors 
in the evolution parameters, A\ and a.\, are sufficiently large that we must conclude that EMRI observations alone 
will be unable to detect evolution in the black hole mass function. 

The work presented here has made use of various simplifications, which we now discuss. 
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FIG. 14: As Figure [13] but now showing results for various assumptions about LISA and about the spin of the central black 
holes of the EMRI sources. In all cases we have taken A — 0.002Mpc~ 3 and different curves correspond to different assumptions, 
as labelled in the key. 



A. EMRI parameter space 



We have considered measurements of the central black hole mass and the source redshift only, and have taken all 
EMRIs to be circular, equatorial inspirals into black holes with the same spin. We also used signal-to-noise ratios that 
were averaged over sky position and orientation when assessing the detectability of different sources. This allowed 
us to use the observable lifetime functions given in [17]. It will be important to extend this work to generic EMRIs, 
which will require the extension of the observable lifetime calculation to generic orbits. In general, the addition of 
extra parameters to a model tends to decrease the precision to which the model parameters can be measured, as we 
saw when we allowed the parameters of the mass function to vary with redshift. However, in this case, we will not only 
be adding additional parameters, but additional measurements, since LISA EMRI observations will also measure the 
black hole spin, orbital eccentricity etc. to high precision [IT]. We would therefore not expect our conclusions about 
the precision to which we can measure the black hole mass function to change significantly. We have already seen that 
we can measure the slope of the mass function to higher precision if black holes tend to be more rapidly spinning, as 
the presence of spin increases the range in mass and redshift within which EMRIs can be detected. Eccentricity may 
have a similar effect, as the additional waveform harmonics that arise due to eccentricity will enter the LISA frequency 
band earlier, thus tending to enhance the signal-to-noise ratio. These effects must be quantified in the future. 

The most important consequence of including new parameters will be the ability to ask additional astrophysics 
questions. EMRIs will have considerable power to probe the spin distribution of black holes in the appropriate mass 
range at low redshift. Measurements of the masses of the inspiraling objects and the orbital eccentricities will provide 
constraints on the processes occurring in stellar clusters in the centres of galaxies, such as mass segregation and the 
efficiency of the various channels that lead to EMRIs [10] . It is important to understand what LISA will be able to tell 
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us about these various processes and how correlations could affect our ability to address these questions independently. 

B. Data analysis technique 

There are various assumptions in the analysis technique described here that could be relaxed. We have ignored 
parameter errors, other than to verify they did not significantly affect our results. This was possible because we 
were using binning of the observations before analysing the data. We have described how errors can be included 
properly and how the data can be analysed in a continuum limit, but this requires having properly sampled posterior 
pdfs for the EMRI sources. While we do not anticipate that our results will change significantly under such an 
analysis, it might be possible to explore this using pdfs for EMRI sources constructed in the context of the Mock 
LISA Data Challenges ■ Additionally, the construction of the observable lifetime used here assumes that the LISA 
observation is 100% complete for sources with SNR > 30, and 0% complete for sources with SNR< 30. The exact LISA 
completeness function will depend on the algorithms employed to carry out data analysis, which are rather uncertain 
at present |25l I26j . Our present model is a reasonable approximation if an SNR cut is used for source selection for the 
follow-up. It is unlikely that LISA data analysis preparation will reach a point where we will have a better model for 
the completeness anytime soon, but it will be straightforward to recompute the effective observable lifetime once one 
is available. Finally, our analysis has used the same model, based on a Poisson probability distribution, to generate 
the data sets we have searched and for the posterior construction. This is likely to be a good model for the EMRIs 
occurring in a given galaxy, but there are various complicating factors, since it takes some time for galactic centres 
to become relaxed after galaxy mergers. Although the galaxies in which LISA can detect EMRIs are at low redshift 
and therefore are unlikely to have undergone recent mergers, the Poisson model could be checked by using numerical 
simulations, of the type described in [IB]. We are also assuming that all systems of a given type have the same EMRI 
rate. While there will certainly be an average rate for systems of particular type, there may be rate "noise" which we 
have ignored here. This could be included in the generation of the sets of events to search, but we would require a 
reasonable model for the amount of noise to include. 



C. EMRI rate uncertainties 

One significant uncertainty in these results is in the correct interpretation of what we are measuring. As discussed 
earlier, the rate of EMRIs of a particular type depends on both the number density of black holes, and on the intrinsic 
rate of inspirals per black hole. We assumed that the second of these was known, using results in |16| . and therefore 
that the EMRI observations were telling us about the number density of black holes. In practice, we will be measuring 
the convolution of these two effects to the precisions quoted here. As discussed in Section [II B[ observations of low- 
mass galaxies and further theoretical work should better inform our understanding of the EMRI rate per galaxy and 
the level of our uncertainty in it before LISA flies. However, there will be some residual uncertainty in the model 
assumptions, and there may be correlations between the rate and the black hole mass, or even other parameters such 
as spin, which might not have been included in the models. LISA is unlikely to be able to tease apart these two 
effects directly, which must be borne in mind when the results are interpreted. The use of other observations, either 
of gravitational waves or in electromagnetic wavebands, might help with the final interpretation. 

D. Future applications 

The work described in this paper has illustrated the potential power of LISA observations for studying the astro- 
physics of black holes in the range 1O 4 M < M < 10 7 M Q . We have focussed on EMRI observations as an illustration, 
but the same approach can be used for the interpretation of LISA observations of mergers between supermassive black 
holes (SMBHs). SMBH mergers and EMRIs probe different subsets of the same population of black holes — SMBH 
events will probe the mergers between black holes up to the highest redshift and earliest cosmic times, while EMRIs 
will probe the whole population of black holes (not necessarily active or merging) at z < 1 that are the end products 
of such SMBH mergers. Models that predict the rates of LISA SMBH mergers will therefore also make predictions 
for the number density of these low redshift black holes that play host to EMRIs. Thus, it should be possible to 
derive more powerful constraints on the models from combined observations. In particular, it might be possible to 
use SMBH mergers at lower redshift in conjunction with EMRI observations to place constraints on the evolution of 
the black hole mass function with redshift, which we have seen is not possible using EMRI observations alone. This 
should be explored in more detail in the future. 
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Appendix A: Event distribution allowing for parameter errors 

Here we consider the effect of parameter errors on the distribution of observed events. For events in a given bin we 
assume that the error distribution, P(A Q ; A t ), is known, so that, by integrating this probability distribution over each 
bin, wc can compute the probabilities, pi, that, when observed by LISA, the event appears in bin i. If the process 
occurring in the original bin is a Poisson process with mean /3, then we claim that the observed data is a set of 
independent Poisson processes occurring in each bin, with mean (3pi in bin i. This can be seen as follows. We denote 
the number of observed events in bin i by rij. The probability that we see a given set of observed events {n^} is equal 
to the probability, Pi, that the Poisson process generates a total of N = JT rn events during the observation window, 
multiplied by the probability, P 2 , that N events would be distributed as {ni}. The first of these probabilities is 

P 1= p N e - /N\, (Al) 

while the second is 



n 



(A2) 
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in which 



( 



rn 



n 



) 



= n\/((n — to)! to!) is the number of ways to choose to objects from n. This probability is the 



probability that particular events fall into particular bins multiplied by the number of ways in which the events could 
be ordered into the bins to give the correct total number of events. Expanding out the second term we find that P 2 
reduces to 



where we make use of the fact that ^nPi = 1- This is just the product of the probabilities for Poisson processes with 
means /3pi in each bin. 

There will be many intrinsic bins that contribute to a given observed bin, but since the processes occurring in each 
galaxy are independent we can use the standard result that the distribution of the sum of two independent Poisson 
processes with means (3\ and (3 2 is a Poisson process with mean fli + /?2- We deduce that the number of observed 
events in each bin is drawn from an independent Poisson process, with mean given by the average over the intrinsic 
rates, weighted by the probability distribution of the errors in the measured parameters. 



P2 = Nl Ti 




(A3) 



and hence the total probability P = Pi x P 2 becomes 




(A4) 



